---
title: "README for 'The Benefits from Bundling Demand in K-12
Broadband Procurement'"
author: "Gaurab Aryal, Charles Murry, Pallavi Pal and Arnab Palit"
date: "`r Sys.Date()`"
output: 
  html_document:
    toc: true
    toc_float: true
    number_sections: true
    theme: readable

---

# Overview

This replication package contains all code necessary to replicate the tables and figures in "The Benefits from Bundling Demand in K-12 Broadband Procurement". The analysis consists of two main components:

1. **Simulation Analysis** (Section 2): MATLAB code to simulate bids with and without bundling 
2. **Empirical Analysis**  (Sections 4, 5, 6 and Appendix A): R code processes data and produces all empirical results

The simulation (in MATLAB) takes approximately 2-3 hours to run. The estimation and analysis (in R) takes <1 minute on a standard desktop computer.

---

## Data Availability and Provenance Statements

### Summary of Availability

- [ ] Some data **cannot be made** publicly available.
- [x] Confidential data used in this paper will be preserved in accordance with journal policies.

### Statement about Rights

- [x] We certify that the author(s) of the manuscript have legitimate access to and permission to use the data used in this manuscript.

### Data Sources

#### ESCNJ (Education Services Commission of New Jersey) Data

The paper uses survey data on K-12 school broadband procurement from the Education Services Commission of New Jersey (ESCNJ). This data includes detailed contract information for schools in 2014 and 2015, including prices, bandwidth, vendors, and contract terms.

**Access:** The processed data are available.

**Datafile:** `1_Input/df_analysis.RData` (provided in public replication package)

#### FCC Fixed Broadband Deployment Data

The paper uses publicly available data from the FCC Fixed Broadband Deployment database to construct measures of ISP competition at the Census block level.

**Source:** Federal Communications Commission (FCC)  
**Access:** Freely available at https://www.fcc.gov/general/broadband-deployment-data-fcc-form-477  
**Citation:** Federal Communications Commission. "Fixed Broadband Deployment Data." 

**Datafile:** Used to construct competition measures merged into `df_analysis.RData`

### License for Data

The ESCNJ and the FCC data are in the public domain and can be redistributed with proper attribute.

---

## Computational Requirements

### Software Requirements

#### MATLAB
- MATLAB R2018a or later
- No additional toolboxes required
- Tested on MATLAB R2024b

#### R
- R version 4.0.0 or later (code last run with R 4.3.1)
- Required packages (automatically installed by script):
```r
  readr, dplyr, tibble, feather, stringr, ggplot2,
               readxl, plm, magrittr, stargazer, maxLik,
               gtsummary, corrr, tidyr, fixest, janitor,
               kableExtra, huxtable, scales, ggthemes, broom, 
               margins, patchwork, rstudioapi
```

The R script `main.R` will install missing packages using `pacman`.

### Controlled Randomness

- [x] Random seed is set at line 4 of MATLAB program `main.m` (`rng(666)`)
- [x] No pseudo-random number generator is used in the R analysis

### Memory, Runtime, Storage Requirements

#### Summary

**Time to reproduce:**
- [x] 2-3 hours (primarily due to MATLAB simulation)
- MATLAB simulation: 2-3 hours
- R analysis: < 1 minute

**Storage required:**
- [x] < 5 MB (code and simulated data only)
- Additional <200 KB with ESCNJ data

#### Computational Details

The code was last run on:
- **MATLAB:** MacOS 15.6.1, Apple M1 Max processor, 64GB RAM
- **R:**      MacOS 15.6.1, Apple M1 Max processor, 64GB RAM

---

## Description of Programs/Code

### Directory Structure
```
K-12_replication/
├── 1_Simulation/
│   └── main.m                    # MATLAB simulation code
|   └── convolution_cdf.m         # MATLAB helper function for convolution  
├── 2_Estimation/
│   ├── 1_Input/
│   │   └── df_analysis.RData     # main data (provided)
│   ├── 2_Code/
│   │   ├── main.R                # Main R analysis script
│   │   └── utils.R               # Utility functions for tables/figures
│   └── 3_Output/
│       ├── tables/               # Generated tables (.tex files)
│       └── figures/              # Generated figures (.eps)
```

### Program Descriptions

#### 1_Simulation/main.m (MATLAB)

**Purpose:** Generates bids with and without bundling 

**Outputs:**
- Saves simulation results to `Bids.mat`
- Figure 1 (a) and (b): Scatter plots comparing bids pre- and post-bundling
- Table 1: Average bids across different parameter combinations (reported in text)


**Runtime:** Approximately 2-3 hours

**Key parameters:**
- `L = 1000`: Number of Monte Carlo simulations
- `Gamma = [4, 8, 12, 16]`: Cost complementarity parameters
- `ns = [2, 3, 5, 10]`: Number of bidders

#### 2_Estimation/2_Code/main.R (R)

**Purpose:** Processes data and generates summary table and all empirical tables and figures

**Outputs:**

**Tables** (saved to `3_Output/tables/`):

- `tab_sumstats.tex` → Table 2: Summary Statistics
- `did_results.tex` → Table 3: Main DiD Results  
- `did_results_catAD.tex` → Table 4: Results by Category A/D
- N/A → Table 5: Consortium Bidding Competition (hard-coded in paper from external data)
- `did_competition_bound_catD.tex` → Table 6: Price Effect of Competition 
- `did_robustness_69.tex` → Table 7: Robustness Checks: Alternative Samples
- N/A → Table 8: Bounds on the Expenditure Savings (hard-coded in paper from data)
- `tab_isp_2014_2015.tex` → Table A.1: ISP Market Presence by Region
- `tab_sumstats_region.tex` → Table A.2: Summary Stats by Region

**Figures** (saved to `3_Output/figures/`):

- `bounds_price.eps` → Figure 3: Price Effect Sensitivity
- `bounds_bandwidth.eps` → Figure 3: Bandwidth Effect Sensitivity  
- `price_difference_scatter.pdf` → Figure 4: Price Difference Between 2015 Contract and Winning Bid for Control Schools
- `Manski_Expenditure.pdf` → Figure 5: Bounds on Savings under the Violation in Parallel Trends
- `CS_interval_participating.pdf` → Figure 6(a): Bounds for Change in Welfare (Participants)
- `CS_interval_NOT_participating.pdf` → Figure 6(b): Bounds for Change in Welfare (Non-participants)
- `combined_welfare_bounds.pdf` → Figure 7: Estimated Bounds for Change in School Welfare



| `tab_sumstats_region.tex` | Table A.2: Summary Stats by Region |
**Runtime:** <5 minutes

**Dependencies:**
- `utils.R`: Contains helper functions for creating formatted tables and figures
- `df_analysis.RData`: Input data (provided)

#### 2_Estimation/2_Code/utils.R

**Purpose:** Utility functions for creating professionally formatted LaTeX tables and ggplot figures

**Key functions:**
- `calculate_sumstats()`: Computes summary statistics
- `create_latex_table()`: Generates formatted regression tables
- `create_professional_plot()`: Creates publication-quality plots
- Additional helper functions for specific table formats

### License for Code

The code is licensed under an MIT license. See LICENSE.txt for details.

---

## Instructions to Replicate

### Setup

1. **Directory Structure:** Ensure the folder structure matches the layout described above

2. **Software Installation:**
   - Install MATLAB R2018a or later
   - Install R 4.0.0 or later
   - Install RStudio (recommended for easier path management)

3. **R Package Installation:**
   - Open `2_Estimation/2_Code/main.R` in RStudio
   - The script will automatically install all required packages on first run

### Replication Steps

#### Part 1: Simulation Analysis (MATLAB)

1. Open MATLAB
2. Navigate to `1_Simulation/` directory
3. Run `main.m`
   - This will take approximately 2-3 hours
   - Generates Figure 1 and data for Table 1
   - Saves intermediate results to `Bids.mat`

**Output files:**
- `bundling_comparison_all_bids_n2.pdf` → Figure 1(a)
- `bundling_comparison_winning_bids_n2.pdf` → Figure 1(b)
- `Bids.mat` (simulation results)
-  Table 1 in the text. 

#### Part 2: Empirical Analysis (R)

**Note:** This requires access to the data file `df_analysis.RData`. 

1. **Set up data:**
   - place `df_analysis.RData` in `2_Estimation/1_Input/`

2. **Configure paths:**
   - Open `2_Estimation/2_Code/main.R` in RStudio
   - Manual path configuration may be needed 

3. **Run analysis:**
   - Open `2_Estimation/2_Code/main.R` 
   - Source `2_Estimation/2_Code/main.R`
   - Script will run automatically from start to finish

**Output:**
- All tables saved to `2_Estimation/3_Output/tables/`
- All figures saved to `2_Estimation/3_Output/figures/`

### Important Notes on Table Formatting

**Manual Edits:** Tables in the published paper differ slightly from the automatically generated `.tex` files:

1. **Column Numbering:** Tables 3-7 use consecutive column numbering across tables (e.g., if Table 3 has columns 1-2, Table 4 starts with column 3). This sequential numbering was added manually to the `.tex` files after generation.

2. **Cosmetic Changes:** Minor formatting adjustments were made to table layouts (spacing, alignment) for journal submission, but **all numerical values are exactly as generated by the code**.

3. **Table 5:** The Consortium bidding competition table (Table 5) is hard-coded in the paper from the data. 

4. **Table 6:** The Price Effects Controlling for Consortium Competition table (Table 6) displays coefficients for only two variables, even though the code generates additional coefficients. 


5. **Table 8:** Bounds on the Expenditure Savings is hard-coded in the paper using the output from the code.

**To match published tables exactly:**
- Use the generated `.tex` files for all numerical values
- Apply manual column renumbering as described above
- Refer to the published paper for final formatting details

---

## List of Tables and Programs

### Tables

| Table # | Description | Program | Output File | Notes |
|---------|------------|---------|-------------|-------|
| Table 1 | Simulation Results Summary | `1_Simulation/main.m` (lines 189, 258-285) | (reported in text from `Bids.mat`) | Average bids computed from simulation |
| Table 2 | Summary Statistics | `2_Estimation/2_Code/main.R` (lines 147-268) | `tab_sumstats.tex` | |
| Table 3 | Main DiD Results | `2_Estimation/2_Code/main.R` (lines 272-327) | `did_results.tex` | Column numbering edited manually |
| Table 4 | Results by Category | `2_Estimation/2_Code/main.R` (lines 333-375) | `did_results_catAD.tex` | Column numbering edited manually |
| Table 5 | Consortium Bidding | (hard-coded in text) | N/A | From data (Section 5.2.2) |
| Table 6 | Competition Bounds | `2_Estimation/2_Code/main.R` (lines 380-433) | `did_competition_bound_catD.tex` | Column numbering edited manually and coefficients for only two variables are displayed|
| Table 7 | Robustness Checks | `2_Estimation/2_Code/main.R` (lines 545-657) | `did_robustness_69.tex` | Column numbering edited manually |
| Table 8 | Expenditure Savings | `2_Estimation/2_Code/main.R` (lines 706-775) | N/A | Created using output from the code |
| Table A.1 | ISP Market Presence | `2_Estimation/2_Code/main.R` (lines 1250-1285) | `tab_isp_2014_2015.tex` | |
| Table A.2 | Regional Summary Stats | `2_Estimation/2_Code/main.R` (lines 1287-1415) | `tab_sumstats_region.tex` | |

### Figures

| Figure # | Description | Program | Output File | Notes |
|----------|------------|---------|-------------|-------|
| Figure 1(a) | All Bids Comparison | `1_Simulation/main.m` (lines 30-116) | `bundling_comparison_all_bids_n2.pdf` | |
| Figure 1(b) | Winning Bids | `1_Simulation/main.m` (lines 118-185) | `bundling_comparison_winning_bids_n2.pdf` | |
| Figure 2 | (Descriptive/Conceptual) | N/A |  | Not generated by code |
| Figure 3 | Price and Bandwidth Effect Sensitivity | `2_Estimation/2_Code/main.R` (lines 438-541) | `bounds_price.eps` and `bounds_bandwidth.eps`| |
| Figure 4 | Price Difference Between 2015 Contract and Winning Bid for Control Schools | `2_Estimation/2_Code/main.R` (lines 662-696) | `price_difference_scatter.pdf` | |
| Figure 5 | Bounds on Savings under the Violation in Parallel Trends | `2_Estimation/2_Code/main.R` (lines 777-891) | `Manski_Expenditure.pdf` | |
| Figure 6(a, b) | Welfare (Participants and Non-participants) | `2_Estimation/2_Code/main.R` (lines 893-1068) | `CS_interval_participating.pdf` and `CS_interval_NOT_participating.pdf` | |
| Figure 7 | Estimated Bounds for Change in School Welfare | `2_Estimation/2_Code/main.R` (lines 1070-1238) | `combined_welfare_bounds.pdf` |  |

---

## References

Federal Communications Commission. "Fixed Broadband Deployment Data from FCC Form 477." https://www.fcc.gov/general/broadband-deployment-data-fcc-form-477

Education Services Commission of New Jersey (ESCNJ). "K-12 Broadband Procurement Data, 2014-2015." Administrative data accessed under data use agreement.

---

## Acknowledgements

This README follows the template provided by the [Social Science Data Editors](https://social-science-data-editors.github.io/guidance/template-README.html) and has been adapted for this project's specific requirements.

---

**Contact Information**

For questions about replication contact the authors.

Last updated: `r Sys.Date()`